This document describes fitting a hierarchical ordination to data, including code and the nasty details that are needed.
The data is included in the ‘mvabund’ ‘R’-package, but was originally collected by Gibb et al. It is the observed abundances of 41 ant species at 30 sites in Australia, along with 7 environmental variables, and 5 traits. The ecological question is how do the environmental variables and traits affect the community composition, including whether they interact. The methodological question is how does hierarchical ordination help.
The abundances, traits and environment are each stored in a different matrix. First we load the data and set some constants:
library(nimble)
library(nimbleHMC)
library(mvabund)
library(coda)
data("antTraits")
Y <- antTraits$abund # abundance data of sites by species
X <- scale(antTraits$env) # environmental variables of sites by predictors
TR <- scale(model.matrix(~0+.,antTraits$traits)) # species traits
NSites <- nrow(Y) # number of sites
NSpecies <- ncol(Y) # number of species
NTraits <- ncol(TR) # number of traits
NEnv <- ncol(X) # number of environmental predictors
# Create data lists for Nimble
dat <- list(Y = Y, X = X, TR = TR)
consts <- list(NSites = NSites, NEnv = NEnv, NTraits = NTraits, NSpecies = NSpecies)
The data are counts of each species so we assume they follow a Poisson distribution with a log link function, as we would do in a standard genralised linear model. We assume each species has the same mean abundance (i.e. a different \(\beta_{0j}\) for each species \(j\)), and model the rest of the variation with the hierarchical ordination. This gives the following model for the log of the mean abundance (\(\eta_{ij}\)):
\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \]
As with a normal ordination, \(\boldsymbol{z}\) and \(\boldsymbol{\gamma}_j\) are the site scores and species loadings. But here we assume that they each have a variance of 1, so that \(\boldsymbol{\Sigma}\) is the variance of the latent variables: it will typically be a diagonal matrix, and if any of the terms on the diagonal are close to 0, this suggests that latent variable has (almost) no effect. It thus provides a straightforward summary of the importance of that latent variable.
If we stopped here, this would be a standard GLLVM. But we want to model both \(\boldsymbol{z}\) and \(\boldsymbol{\gamma}_j\), i.e. go from modelling model groups of species responding in similar ways to sites to modelling how the traits of the species affect their responses to the environment.
As a simplification, we can think about this as simply a regression of \(\boldsymbol{z}\) (the site effects) against the environmental variables and \(\boldsymbol{\gamma}_j\) against the traits. In reality, it is more complicated, because \(\boldsymbol{z}\) and \(\boldsymbol{\gamma}_j\) are estimated in the model, so their uncertainty needs to be included.
We denote the abundance at site \(i = 1 \dots n\) for species \(j = 1 \dots p\) as \(Y_{ij}\), or as a matrix as \(\boldsymbol{Y}\). The environmental predictors are \(X_{ik}\) for the \(k = 1 \ldots K\) predictors (or \(\boldsymbol{X}\) as a matrix), and \(t = 1\ldots T\) traits as \(\boldsymbol{R}\). We then assume
\[Y_{ij} \sim Pois(\lambda_{ij}),\]
with
\[log(\lambda_{ij}) = \eta_{ij}.\]
So that \(\eta_{ij}\) is the linear predictor, which we will model with the hierarchical ordination:
\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j \]
We then model \(\boldsymbol{z}_i\) (as in van der Veen et al (2023)) and \(\boldsymbol{\gamma}_j\) hierarchically:
\[ \boldsymbol{z}_i = \boldsymbol{B}^\top\boldsymbol{x}_i + \boldsymbol{\epsilon}_i \] and
\[ \boldsymbol{\gamma}_j = \boldsymbol{\omega}^\top\boldsymbol{R}_{j} + \boldsymbol{\varepsilon}_j \] where
We fit the model with the \(\texttt{Nimble}\) package. We start with a single dimension for simplicity, so that we can show the steps needed.
The fitting process needs the mode, and the code to run it.
OneLV.nim <- nimbleCode({
for (i in 1:NSites) {
for (j in 1:NSpecies) {
# These three lines specify the model:
Y[i, j] ~ dpois(lambda[i, j]) # Likelihood: Y follows a Poisson distribution
log(lambda[i, j]) <- eta[i, j] # Specify the log link function
eta[i,j] <- beta0[j] + gamma[j]*sd.LV*z[i] # linear predictor: species + LV
}
# Site scores: regression against environmental effects
# the Bs are the coefficients of the environmental effects
epsilonSTAR[i] ~ dnorm(0, sd = sd.Site) # Residual site effect
XB[i] <- inprod(X[i, 1:NEnv],BSTAR[1:NEnv])
zSTAR[i] <- XB[i] + epsilonSTAR[i]
}
for(j in 1:NSpecies) {
# Species effects: regression against trait effects
# The Os are the trait effects.
varepsilonSTAR[j] ~ dnorm(0, sd = sd.Species) # Residual
omegaTR[j] <- inprod(TR[j, 1:NTraits],OSTAR[1:NTraits])
gammaSTAR[j] <- omegaTR[j] + varepsilonSTAR[j]
beta0[j] ~ dnorm(0, sd = 1) # Species means
}
# Here we standardise z and gamma, so the both have a variance of 1.
# The standard deviation of the latent variable is sd.LV.
zmu <- mean(zSTAR[1:NSites])
gammamu<- mean(gammaSTAR[1:NSpecies])
StdDev.z <- sqrt(mean((zSTAR[1:NSites] - zmu)^2))
StdDev.gamma <- sqrt(mean((gammaSTAR[1:NSpecies] - gammamu)^2))
z[1:NSites] <- (zSTAR[1:NSites]-zmu)/StdDev.z
gamma[1:NSpecies] <- (gammaSTAR[1:NSpecies]-gammamu)/StdDev.gamma
# Scale other parameters
epsmu <- mean(epsilonSTAR[1:NSites])
varepsmu <- mean(varepsilonSTAR[1:NSpecies])
epsilon[1:NSites] <- (epsilonSTAR[1:NSites]-epsmu)/StdDev.z
B[1:NEnv] <- BSTAR[1:NEnv]/StdDev.z
varepsilon[1:NSpecies] <- (varepsilonSTAR[1:NSpecies]-varepsmu)/StdDev.gamma
O[1:NTraits] <- OSTAR[1:NTraits]/StdDev.gamma
# priors for scales
sd.Site ~ dexp(1)
sd.Species ~ dexp(1)
sd.LV ~ dexp(1)
for(k in 1:NEnv){
BSTAR[k] ~ dnorm(0, sd = 1)
}
for(tr in 1:NTraits){
OSTAR[tr] ~ dnorm(0, sd = 1)
}
})
inits<-function(consts){
BSTAR = rnorm(consts$NEnv)
OSTAR = rnorm(consts$NTraits)
varepsilonSTAR = rnorm(consts$NSpecies)
epsilonSTAR = rnorm(consts$NSites)
list(
BSTAR = BSTAR,
OSTAR = OSTAR,
epsilonSTAR = epsilonSTAR,
varepsilonSTAR = varepsilonSTAR,
sd.Site = rexp(1),
sd.Species = rexp(1),
beta0 = rnorm(consts$NSpecies),
sd.LV = rexp(1)
)
}
We parallelise running the MCMC so that each chain is run on a different processor. This means we need a function to run one chain, and then another to run all of the chains together.
# Function to run one chain: it can be done with HMC or other MCMC algorithms.
RunOneChain <- function(seed, dat, code, inits, consts, ToMonitor=NULL,
Nburn=5e3, NIter=5.5e4, Nthin=10, HMC = FALSE, ...) {
require(nimble)
require(nimbleHMC)
AllSamplers <-c('epsilonSTAR', 'varepsilonSTAR', 'beta0', 'OSTAR', 'BSTAR',
'sd.Site', 'sd.Species', 'sd.LV')
HMCsamplers <-c('epsilonSTAR', 'varepsilonSTAR','BSTAR', 'OSTAR', 'beta0',
'sd.Site', 'sd.Species', 'LV', 'sd.LV')
if(is.null(ToMonitor)) {
ToMonitor <- c("beta0", "sd.Species", "sd.Site", "sd.LV", "B", "O",
"epsilon", "varepsilon", "gamma", "z")
}
# Mons <- c("beta0","sd.Species","sd.Site","sd.LV","B","OSTAR", "epsilonSTAR","varepsilonSTAR")
mod <- nimbleModel(code = code, name = "HO", constants = consts,
inits = inits(consts), data = dat, buildDerivs = TRUE)
model <- compileNimble(mod)
if(HMC) { # Do HMC on everything
conf <- configureHMC(model, monitors = ToMonitor, print = FALSE,
control=list(nwarmup=Nburn, kappa=0.95))
# Use a slice everything that not being HMCed
Slicesamplers <-AllSamplers[!AllSamplers%in%HMCsamplers]
if(length(Slicesamplers)>1) {
conf$removeSamplers(Slicesamplers)
sapply(Slicesamplers, conf$addSampler, type = "AF_slice")
}
} else {
conf <- configureMCMC(model, monitors = ToMonitor, print = FALSE)
# Change the samplers to block update, using a nifty extension of the slice sampler
Slicesamplers <-c('epsilon', 'varepsilon')
conf$removeSamplers(Slicesamplers)
conf$addSampler(target = Slicesamplers, type = "AF_slice")
}
# nimbleOptions(MCMCusePredictiveDependenciesInCalculations = FALSE)
mcmc <- buildMCMC(conf)
cmcmc <- compileNimble(mcmc, project = model)
res <- runMCMC(cmcmc, niter=NIter, nburnin = Nburn, thin=Nthin,
nchains = 1, samplesAsCodaMCMC = TRUE, ...)
return(res)
}
# Function to run parallel chains
ParaNimble <- function(NChains, ...) {
require(parallel)
nimble_cluster <- makeCluster(4)
samples <- parLapply(cl = nimble_cluster, X = 1:4,...)
stopCluster(nimble_cluster)
# Name the chains in the list
chains <- setNames(samples,paste0("chain", 1:length(samples)))
chains
}
These models are notorious for being unidentifiable, i.e. you can get the same mean abundances from different combinations of the parameters. We therefore have to make some adjustments to the model: some of this is done in the model fitting, but for others it is easier to do it afterwards.
The functions to swap signs are below. We want to make one parameter positive, so ideally we want to do this to a parameter that doesn’t want to swap signs. We identify this in an ad hoc way: for each chain for every parameter we calculate the proportion of iterations where the sign is positive, and then for every parameter we calculate the variance in the proportion. If a parameter is centred around 0, then the mean proportion will be about 0.5, and will not vary much, whereas if it is some way from 0 the mean will be close to 0 or 1, and the variance will be high.
An alternative approach would be to look at the largest \(|p - 0.5|\). There may be better alternatives too.
# Utility Function to get logical indicators for if names contains a string in v
GetInds <- function(v, names) {
if(length(v)==1) {
res <- grep(v, names)
} else {
res <- c(unlist(sapply(v, grep, x=names)))
}
res
}
# Function to swap signs of all variables varinds to have same sign as vartosign.
# Might have to change this to make sure different chains stick to the same sign
ReSignChain <- function(chain, varinds, vartosign) {
# MeanSign <- sign(mean(chain[ ,s])) # Might need this
res <- t(apply(chain, 1, function(x, vs, vi) {
Names <- names(x)[vi]
if(any(grepl(",", Names))) {
lvind <- gsub(".*, ", "", Names)
} else {
lvind <- seq_along(vs)
}
x[vi] <- x[vi]*sign(x[vs[lvind]])
x
}, vs=vartosign, vi=varinds))
as.mcmc(res)
}
# Function to post-process chains to swap signs.
postProcess <- function(Chains, VarsToProcess, VarToSign=NULL, print=FALSE) {
if(is.null(VarToSign)) VarToSign <- VarsToProcess
SignInd <- GetInds(VarToSign, colnames(Chains[[1]]))
# Get indicators for all variables to have their signs changed
ProcessInds <- GetInds(VarsToProcess, names = colnames(Chains[[1]]))
# Check if > 1 LV
SeveralLVs <- any(grepl(",", colnames(Chains[[1]])[SignInd]))
# Calculate variance of mean of indicator of sign:
# hopefully largest is variable with most sign swapping (i.e. )
Signs <- as.data.frame(lapply(Chains, function(mat, ind) {
colMeans(mat[,ind]>0)
}, ind=SignInd))
VarSign <- apply(Signs, 1, var)
if(SeveralLVs) {
LV <- gsub(".*, ", "", colnames(Chains[[1]])[SignInd])
#Chose variables who's sign will be used to swap other signs
VarsToSwapBy <- sapply(unique(LV), function(lv, vs) {
vv <- vs[grep(lv, names(vs))]
nm <- names(which(vv==max(vv)))
if(length(nm)>1) nm <- nm[1] # probably something more sophisticated is better
nm
}, vs = VarSign, simplify = TRUE)
chains.sgn <- lapply(Chains, ReSignChain, vartosign=VarsToSwapBy, varinds=ProcessInds)
} else { # only 1 LV
#Chose variables who's sign will be used to swap other signs
VarToSwapBy <- names(which(VarSign==max(VarSign)))[1]
if(print) message(paste0("Swapping by ", VarToSwapBy))
# chains.sgn <- lapply(Chains, ReSignChain, varNames=VarsToProcess, sign.var=ind)
chains.sgn <- lapply(Chains, ReSignChain, vartosign=VarToSwapBy, varinds=ProcessInds)
}
as.mcmc.list(chains.sgn)
}
Finally, we can run the MCMC.
chains.unsgn <- ParaNimble(4, fun = RunOneChain,
dat = dat,
code = OneLV.nim,
inits = inits, HMC = TRUE,
# Nburn=5e1, NIter=5.5e2, Nthin=1, # for a small run
Nburn=1e3, NIter=2e3, Nthin=1, # for a big HMC run
consts = consts)
# post-process chains for sign-swapping
VarsToProcess <- c("^B", "^O", "^epsilon", "^varepsilon", "^z", "^gamma")
VarsToSign <- c("^B")
chains <- postProcess(chains.unsgn, VarsToProcess=VarsToProcess,
VarToSign = VarsToSign, print = TRUE)
summ.HO = summary(chains)
There are a lot of parameters, so a lot of results to look at. Here we concentrate on \(\boldsymbol{B}\), \(\boldsymbol{\omega}\), \(\boldsymbol{\sigma}\) and \(\boldsymbol{\delta}\), and a latent variable-plot.
First the environmental effects
library(basicMCMCplots)
chainsPlot(chains, var = c("B"), legend = F,traceplot=T)
Then the trait effects
chainsPlot(chains, var = c("O"), legend = F,traceplot=TRUE)
These look quite OK (after post-processing). Now we can create a plot of the LV-scores against their indices to inspect the results.
PlotPost <- function(var, summ, varnames=NULL, ...) {
vars <- grep(var, rownames(summ$statistics))
if(is.null(varnames)) varnames <- rownames(summ$statistics)[grep(var, rownames(summ$statistics))]
if(length(varnames)!=length(vars))
stop(paste0("Number of variable names, ", length(varnames), "not the same as number of variables, ",
length(vars)))
plot(summ$statistics[vars,"Mean"], 1:length(vars),
xlim=range(summ$quantiles[vars,]), yaxt="n", ...)
segments(summ$quantiles[vars,"2.5%"], 1:length(vars), summ$quantiles[vars,"97.5%"], 1:length(vars))
segments(summ$quantiles[vars,"25%"], 1:length(vars), summ$quantiles[vars,"75%"], 1:length(vars), lwd=3)
abline(v=0, lty=3)
axis(2, at=1:length(vars), labels=varnames, las=1)
}
par(mar=c(4.1,4.5,1,1))
PlotPost(var="B", summ=summ.HO, varnames = gsub("\\.", "\n", colnames(X)),
xlab="Environmental Effect", ylab="")
We can see that both Coarse woody debris and canopy cover have positive effects on the latent variable, so ants that have a positive species score will be more abundant in areas with more canopy cover and CWD.
We can also look at the trait effects. The bottom line is that there is too much uncertainty to say anything.
par(mar=c(4.1, 7,1,1))
PlotPost(var="O", summ=summ.HO, varnames = gsub("\\.", "\n", colnames(TR)),
xlab="Trait Effect", ylab="")
We can also calculate the full distributions of site and species scores, and plot them.
ExtractScore <- function(mcmc.lst, betaname, epsname, XX, summ=FALSE,
scorename="score", add=TRUE) {
betaind <- grep(paste0('^', betaname), colnames(mcmc.lst[[1]]))
epsind <- grep(paste0('^', epsname), colnames(mcmc.lst[[1]]))
CalcScore <- function(v, betaind, epsind, XX, scorename) {
res <- XX%*%v[betaind] + v[epsind]
# names(res) <- names(v[epsind])
names(res) <- gsub(".*\\[", paste0(scorename, "["), names(v[epsind]))
res
}
PostScore <- function(mc, LV=0, scorename, ...) {
if(ncol(XX)==length(betaind)) {
as.mcmc(t(apply(mc, 1, CalcScore, scorename=scorename, ...)))
} else {
LV.beta <- gsub(".*, |]", "", colnames(mc)[betaind])
LV.eps <- gsub(".*, |]", "", colnames(mc)[epsind])
LVs <- unique(LV.beta)
res <- sapply(LVs, function(lv, mc, betaind, epsind, l.be, l.e, XX, scorename) {
bInd <- betaind[l.be==lv]
eInd <- epsind[l.e==lv]
apply(mc, 1, CalcScore, betaind=bInd, epsind=eInd, XX=XX, scorename=scorename)
}, mc = mcmc.lst[[1]], betaind=betaind, epsind=epsind, l.be=LV.beta,
l.e=LV.eps, XX=XX, scorename=scorename, simplify=FALSE)
res.mcmc <- as.mcmc(t(do.call(rbind, res)))
res.mcmc
}
}
# score <- as.mcmc.list(lapply(mcmc.lst, PostScore, betaind=betaind, epsind=epsind, XX=XX))
if(add) {
score <- as.mcmc.list(lapply(mcmc.lst, function(lst, scorename, ...) {
# ps <- PostScore(lst, betaind=betaind, epsind=epsind, XX=XX)
ps <- PostScore(lst, scorename=scorename, ...)
as.mcmc(cbind(lst, ps))
}, betaind=betaind, epsind=epsind, XX=XX, scorename=scorename))
} else {
score <- as.mcmc.list(lapply(mcmc.lst, PostScore, betaind=betaind,
epsind=epsind, XX=XX, scorename=scorename))
}
if(summ) score <- summary(score)
score
}
We see that, well, there is variation, so the trait effects are not uncertain because the species scores are uncertain: it is because their effects are small.
par(mfrow=c(2,1), mar=c(2,12,4,1))
PlotPost("^z", summ=summ.HO, varnames = NULL, ylab="", xlab="Latent variable 1", main = "Sites")
PlotPost("^gamma", summ=summ.HO, varnames = NULL, ylab="", xlab="Latent variable 1", main = "Species")
More than one dimension requires adding extra identifiability constraints. Now we have two or more latent variables, we also have to worry about rotating them.
There are various ways to place the constraints, some more numerically stable than others. Generally, we note that the model on the link scale is similar to existing matrix decompositions, so that much on the required constraints can be learned from those.
To prevent rotational invariance, we require \(\boldsymbol{B}\) and \(\boldsymbol{\omega}\) to have zeros above the main diagonal. van der Veen et al (2023) instead assumed \(\boldsymbol{B}\) to be a semi-orthogonal matrix as they encountered numerical instability with the constraint that we impose here. However, their constraint would require sampling on constrained parameter spaces, which is difficult, while this constraint formulation allows to use standard out-of-the-box Markov Chain Monte carlo samplers (also, it seems to work fine here). Note, that when $d p,t $ the model is again rotationally invariant, and additional constraints (e.g., order constraints for the latent variables as in a varimax rotation) will need to be added.# Update our constants from before with the new number of LVs, rest remains the same
# Model code
HO.nim <- nimbleCode({
for (i in 1:NSites) {
for (j in 1:NSpecies) {
eta[i,j] <- beta0[j] + sum(gamma[j,1:nLVs]*sd.LV[1:nLVs]*z[i,1:nLVs])
log(lambda[i, j]) <- eta[i, j]
Y[i, j] ~ dpois(lambda[i, j])
}
for (l in 1:nLVs) {
XB[i, l] <- sum(X[i, 1:NEnv]*BSTAR[1:NEnv, l])
epsilonSTAR[i,l] ~ dnorm(0,sd.Site[l])#Residual
zSTAR[i,l] <- XB[i,l] + epsilonSTAR[i,l]
}
}
for(j in 1:NSpecies) {
for (l in 1:nLVs) {
omegaTR[j, l] <- sum(TR[j, 1:NTraits]*OSTAR[1:NTraits, l])
varepsilonSTAR[j,l] ~ dnorm(0,sd.Species[l]) # Residual
gammaSTAR[j,l] <- omegaTR[j,l] + varepsilonSTAR[j,l]
}
beta0[j] ~ dnorm(0, sd=1)
}
# Constraints to 0 on upper diagonal
# stole some code from Boral for this - thanks Francis
for(l in 1:(nLVs-1)) {
for(ll in (l+1):(nLVs)) {
BSTAR[l,ll] <- 0
}
}
for(l in 1:nLVs) {
# diagonal elements
BSTAR[l,l] ~ dnorm(0,sd=1)
## standardizing z and gamma
zmu[l] <- mean(zSTAR[1:NSites,l])
StdDev.z[l] <- sqrt(mean((zSTAR[1:NSites,l] - zmu[l])^2))
z[1:NSites,l] <- (zSTAR[1:NSites,l]-zmu[l])/StdDev.z[l]
gammamu[l] <- mean(gammaSTAR[1:NSpecies,l])
StdDev.gamma[l] <- sqrt(mean((gammaSTAR[1:NSpecies,l] - gammamu[l])^2))
gamma[1:NSpecies,l] <- (gammaSTAR[1:NSpecies,l]-gammamu[l])/StdDev.gamma[l]
# Scale other parameters
epsmu[l] <- mean(epsilonSTAR[1:NSites,l])
varepsmu[l] <- mean(varepsilonSTAR[1:NSpecies,l])
epsilon[1:NSites,l] <- (epsilonSTAR[1:NSites,l]-epsmu[l])/StdDev.z[l]
B[1:NEnv,l] <- BSTAR[1:NEnv,l]/StdDev.z[l]
varepsilon[1:NSpecies,l] <- (varepsilonSTAR[1:NSpecies,l]-varepsmu[l])/StdDev.gamma[l]
O[1:NTraits,l] <- OSTAR[1:NTraits,l]/StdDev.gamma[l]
# priors for scales
sd.Site[l] ~ dexp(1)
sd.Species[l] ~ dexp(1)
sd.LV[l] ~ dexp(1)
}
## Free lower diagonals
for(l in 2:nLVs) {
for(ll in 1:(l-1)) {
BSTAR[l,ll] ~ dnorm(0,sd=1)
}
}
for(l in (nLVs+1):NEnv) {
for(ll in 1:(nLVs)) {
BSTAR[l,ll] ~dnorm(0,1)
}
} ## All other elements
# Prior for trait effects
for(tr in 1:NTraits){
for(l in 1:nLVs){
OSTAR[tr,l] ~dnorm(0,1)
}
}
})
inits<-function(consts){
BSTAR = matrix(rnorm(consts$nLVs*consts$NEnv),ncol=consts$nLVs)
BSTAR[upper.tri(BSTAR)] = 0
OSTAR = matrix(rnorm(consts$nLVs*consts$NTraits),nrow=consts$NTraits)
varepsilonSTAR = mvtnorm::rmvnorm(consts$NSpecies,rep(0,consts$nLVs),diag(consts$nLVs))
epsilonSTAR = mvtnorm::rmvnorm(consts$NSites,rep(0,consts$nLVs),diag(consts$nLVs))
list(
BSTAR = BSTAR,
OSTAR = OSTAR,
epsilonSTAR = epsilonSTAR,
varepsilonSTAR = varepsilonSTAR,
sd.Site = rexp(consts$nLVs),
sd.Species = rexp(consts$nLVs),
beta0 = rnorm(consts$NSpecies),
sd.LV = rexp(consts$nLVs)
)
}
Our model is
\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \]
But we can rotate by a rotation matrix \(M\) this to get an equivalent model:
\[ \eta_{ij} = \beta_{0j} + M \boldsymbol{z}_j^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_i. \]
We do this by rotating \(\boldsymbol{z}_j^\top \boldsymbol{\Sigma}\).
The covariance matrix is now \(M \Sigma^2 M^T\).
\[ \begin{align} Var(M \gamma^T \Sigma z) &= E((M z^T \Sigma \gamma)(\gamma \Sigma z^T M)^T)) \\ &= E(M \gamma^T \Sigma zz^T \Sigma^T \gamma M^T) = E(M \gamma^T \Sigma \Sigma^T \gamma M^T) \\ &= \Sigma^2E(M \gamma^T \gamma M^T) = \Sigma^2E(M M^T) \end{align} \]
There are many possible rotations (see the https://cran.r-project.org/web/packages/GPArotation/index.html package for a long list). Here we will just the the varimax rotation, which rotates the ordination so that most of the variation in \(\boldsymbol{\gamma}_j^\top \boldsymbol{\Sigma}\) is put on the first axis, then the second axis has the next largest amount of variance etc.
In detail, for each draw from the posterior we:
The final two steps sweep the variance of \(M \gamma \Sigma z\) into \(\Sigma\), so it still represents the total variance of the latent variables. (NOTE: I AM NOT SURE THIS IS RIGHT)
# Function to convert a variable in an MCMC chain that should be a matrix from
# a vector to the right matrix
ChainToMatrix <- function(ch, name) {
v <- ch[grep(paste0("^", name, "\\["), names(ch))]
l.row <- as.numeric(gsub(".*\\[", "", gsub(",.*", "", names(v))))
l.col <- as.numeric(gsub(".*, ", "", gsub("\\]", "", names(v))))
mat <- matrix(0, nrow=max(l.row), ncol=max(l.col))
for(i in seq_along(ch)) { # there must be a quicker way...
mat[l.row[i], l.col[i]] <- v[i]
}
mat
}
# Function to convert a variable in an MCMC chain that should be a vector to a diagonal matrix
ChainToDiag <- function(ch, name) {
v <- ch[grep(paste0("^", name, "\\["), names(ch))]
mat <- diag(v)
mat
}
# Function to rotate a latent variable, defaults to Varimax. Methods in GPArotation package
# ch: one draw from MCMC chain
# RotateBy: name of variable to rotate by (i.e. to calculate the rotation matrix for)
# SDby: name of Standard deviations
# ToRotate: Variables to rotate (e.g. epsilons, gamma/z).
# SDToRotate: Standatd devations to rotate
# rotation.fn: Function in GPArotation to calcualte rotation. Defaults to Varimax
# scale: Should RotateBy be scaled by SDby before rotating? Defaults to FALSE
rotate2LV <- function(ch, RotateBy, SDby, ToRotate=NULL, SDToRotate=NULL, rotation.fn="Varimax",
scale=FALSE) {
require(GPArotation)
# Function to rotate matrices & return full vector or a matrix (retmat)
RotateVar <- function(name, vec, rotmat, retmat=FALSE) {
mat <- ChainToMatrix(ch=vec, name=name)
rotated <- mat%*%rotmat
if(retmat) {
res <- rotated
} else {
vec[grep(paste0("^", name, "\\["), names(vec))] <- c(rotated)
res <- vec
}
res
}
# Function to rotate (diagonal) matrices & return the diagonal
RotateSD <- function(name, vec, rotmat) {
mat <- diag(vec[grep(paste0("^", name, "\\["), names(vec))])
rotated <- mat%*%rotmat
vec[grep(paste0("^", name, "\\["), names(vec))] <- diag(rotated)
vec
}
#Extract standard deviartions and latent variables
SDs <- ch[grep(SDby, names(ch))]
LVmat <- ChainToMatrix(ch, RotateBy)
if(scale) LVmat <- sweep(LVmat, 2, SDs, "*")
# Get rotation
loadings.r <- do.call(rotation.fn, list(A=LVmat)) # rotate
# Rotate variables
for(nm in unique(c(RotateBy, ToRotate))) {
ch <- RotateVar(vec=ch, name=nm, rotmat=loadings.r$Th, retmat = FALSE)
}
for(nm in unique(c(SDby, SDToRotate))) {
ch <- RotateSD(vec=ch, name=nm, rotmat=loadings.r$Th)
}
ch
}
RotateChains <- function(mcmc.lst, RotateBy="z", SDby = "sd.LV",
ToRotate=c("B", "epsilon", "gamma", "z", "O", "varepsilon"),
SDToRotate=c("sd.Site", "sd.Species"), ...) {
rotated <- lapply(mcmc.lst, function(mcmc) {
rot <- apply(mcmc, 1, rotate2LV, RotateBy=RotateBy, SDby = SDby,
ToRotate=ToRotate, SDToRotate=SDToRotate, ...)
as.mcmc(t(rot))
})
as.mcmc.list(rotated)
}
RescaleVars <- function(vec, ScaleBy, SDTorescale, ToRescale) {
vec2 <- vec
ScBy <- ChainToMatrix(vec, ScaleBy)
SD <- apply(ScBy, 2, sd)
for(nm in unique(c(ScaleBy, ToRescale))) {
Sc.x <- ChainToMatrix(vec, nm)
Sc.x.sc <- sweep(Sc.x, 2, SD, "/")
vec2[grep(paste0("^", nm, "\\["), names(vec2))] <- c(Sc.x.sc)
}
for(nm in SDTorescale) {
SD.x <- vec[grep(paste0("^", nm, "\\["), names(vec))]
vec2[grep(paste0("^", nm, "\\["), names(vec2))] <- SD.x*SD
}
vec2
}
RescaleChains <- function(mcmc.lst, ...) {
rescale <- lapply(mcmc.lst, function(mcmc) {
rot <- apply(mcmc, 1, RescaleVars, ...)
as.mcmc(t(rot))
})
as.mcmc.list(rescale)
}
We can run the fitting with the same code as before:
consts$nLVs <- nLVs <- 2
HO.2LV <- ParaNimble(4, fun = RunOneChain,
dat = dat,
code = HO.nim,
inits = inits, HMC=TRUE,
# Nburn=5e1, NIter=5.5e2, Nthin=1, # for a small run
Nburn=1e3, NIter=3e3, Nthin=1, # for a full run
consts = consts)
# Varimax rotation
chains2LV.r <- RotateChains(HO.2LV, rotation.fn="Varimax")
## Loading required package: GPArotation
# Re-scale LVs etc.
chains2LV.r2 <- RescaleChains(chains2LV.r, ScaleBy="gamma",
SDTorescale=c("sd.LV", "sd.Species"),
ToRescale=c("gamma", "O", "varepsilon"))
chains2LV.r3 <- RescaleChains(chains2LV.r2, ScaleBy="z",
SDTorescale=c("sd.LV", "sd.Site"),
ToRescale=c("z", "B", "epsilon"))
# post-process chains for sign-swapping
chains2LV <- postProcess(chains2LV.r3, VarsToProcess=VarsToProcess,
VarToSign = VarsToProcess[1:2])
# Calculate summaries
summ.HO2LV <- summary(chains2LV)
rm(chains2LV.r, chains2LV.r2, chains2LV.r3)
Environment first:
chainsPlot(chains2LV, var = c("B"), legend = F, traceplot=TRUE)
And the trait effects:
chainsPlot(chains2LV, var = c("O"), legend = F, traceplot=TRUE)
The traceplots look OK, although B[2,1] looks bimodal. This is not ideal, but we can explain why we think it is happening below by looking at the standard deviations:
Ratio <- chains2LV$chain1[,"sd.LV[1]"]/chains2LV$chain1[,"sd.LV[2]"]
Ratio.u <- HO.2LV$chain1[,"sd.LV[1]"]/HO.2LV$chain1[,"sd.LV[2]"]
SDs <- summ.HO2LV$statistics[grep("sd", rownames(summ.HO2LV$statistic)),]
rownames(SDs) <- c(paste0("Latent Variable ", 1:2),
paste0("Site ", 1:2),
paste0("Species ", 1:2))
knitr::kable(SDs, digits=2, format = "html", table.attr = "style='width:70%;'")
| Mean | SD | Naive SE | Time-series SE | |
|---|---|---|---|---|
| Latent Variable 1 | 0.64 | 0.11 | 0 | 0.01 |
| Latent Variable 2 | 0.65 | 0.11 | 0 | 0.01 |
| Site 1 | 0.36 | 0.36 | 0 | 0.02 |
| Site 2 | 0.45 | 0.43 | 0 | 0.02 |
| Species 1 | 0.03 | 0.04 | 0 | 0.00 |
| Species 2 | 0.04 | 0.06 | 0 | 0.00 |
The two latent variables seem to explain similar amounts of variation, and a closer inspection (not shown) suggests that the second LV can explain more variance, which should not be happening.
In fact, if we plot B[2,1] and B[2,2] we see that the effect of the volume of lying CWD has an effect that flips between the first and second latent axes. The fact it is positive and negative suggests that the sign flipping is using a different variable, and is getting it wrong. Fixing the rotation should fix this.
plot(c(chains2LV$chain1[,"B[2, 1]"]), chains2LV$chain1[,"B[2, 2]"], type="n",
xlab="B[2, 1]", ylab="B[2, 2]")
abline(v=0, lty=3); abline(h=0, lty=3);
sapply(1:length(chains2LV), function(l, ch)
points(ch[[l]][,"B[2, 1]"], ch[[l]][,"B[2, 2]"], col=l, cex=0.3), ch=chains2LV)
Now, we can make two-dimensional ordination plots of sites and species, with their predictor effects. Note that these have been rotated, so that most ofthe variance should be on the first axis.
AddArrows <- function(coords, marg= par("usr"), col=2) {
origin <- c(mean(marg[1:2]), mean(marg[3:4]))
Xlength <- sum(abs(marg[1:2]))/2
Ylength <- sum(abs(marg[3:4]))/2
ends <- coords / max(abs(coords)) * min(Xlength, Ylength) * .8
arrows(
x0 = origin[1],
y0 = origin[2],
x1 = ends[,
1] + origin[1],
y1 = ends[, 2] + origin[2],
col = col,
length = 0.1)
text(
x = origin[1] + ends[, 1] * 1.1,
y = origin[2] + ends[, 2] * 1.1,
labels = rownames(coords),
col = col)
}
ExtractMeans <- function(mcmc.lst, name=NULL) { # this needs defensive programming
if(is.null(name)) {
ind <- 1:nrow(mcmc.lst$statistics)
} else {
ind <- grep(name, rownames(mcmc.lst$statistics))
}
Means <- mcmc.lst$statistics[ind,"Mean"]
if(any(grep(",", names(Means)))) {
Col <- gsub("]", "", unique(gsub(".*, ", "", names(Means))))
res <- sapply(Col, function(cc, mn) {
mn[grep(paste0(cc,"]"), names(mn))]
}, Means)
} else {
res <- Means
}
res
}
GetMeans <- function(summ, name, d=1) {
if(is.null(d)) d <- 1
var <- summ$statistics[grep(paste0('^', name), rownames(summ$statistics)),"Mean"]
matrix(var,ncol=d)
}
# LVMeans <- ExtractMeans(LVs, name = "^LV\\[")
# gammaMeans <- ExtractMeans(gamma, name="^gamma")
z.m <- GetMeans(summ.HO2LV, name="^z", d=consts$nLVs)
gamma.m <- GetMeans(summ.HO2LV, name="^gamma", d=consts$nLVs)
vareps.m <- GetMeans(summ.HO2LV, name="^varepsilon", d=consts$nLVs)
eps.m <- GetMeans(summ.HO2LV, name="^epsilon", d=consts$nLVs)
B.m <- GetMeans(summ.HO2LV, name="B", d=consts$nLVs)
O.m <- GetMeans(summ.HO2LV, name="O", d=consts$nLVs)
par(mfrow=c(2,1))
#LVs
plot(z.m,type="n", xlab="Latent variable 1", ylab="Latent variable 2", main = "Sites")
text(z.m,labels=1:consts$NSites)
AddArrows(marg = par("usr"), coords=B.m, col="red")
#gammas
plot(gamma.m,type="n", xlab="Latent variable 1", ylab="Latent variable 2", main = "Species")
text(gamma.m,labels=vegan::make.cepnames(colnames(Y)))
AddArrows(marg = par("usr"), coords=O.m, col="blue")
We can see that almost all of the environmental effects seem to act in the same way, mainly on LV1.
We can also look at some summary statistics for the scale parameters of the latent variables, species-specific residuals and site-specific residuals, respectively:
These tell us how well the predictors explain the ordination; the species- and site-specific scale parameters would be zero if the predictors fully explained the ordination. The scale parameters for the latent variables are similar to singular values (the square root of eigenvalues) in a classical ordination; they reflect a dimension its importance to the response.
The covariance of species is determined by traits, LV-specific variation and sites’ residual variation, and the covariance between sites is determined by the environment, LV-specific variation and species’ residual variation.
The residual covariance matrix of the model (where you calculate species associations from) is determined by three terms:
where \(\mathcal{K}_0\) denotes the zero order modified Bessel function of the second kind. The first term tells us that correlations between species are determined by the environment. The second term tells us that the correlation between sites is determined by species traits. The last term induces no correlation between sites and species for diagonal \(\boldsymbol{\Sigma}\), but serves to scale species associations. Consequently, the covariance for species \(j,j2\) and site \(i,i2\) is: \[\begin{multline} \Sigma_{i,i2,j,j2} = \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i,\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i},\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \\ \text{cov}(\boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j},\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j},\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2})+ \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}), \end{multline}\] third order terms are zero for central normal random variables, so this simplifies to: \[\begin{equation} \Sigma_{i,i2,j,j2} = \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2}+ \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} + \boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\varepsilon}_{j2})\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2} + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j2}). \end{equation}\] Here, \(\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}) = \text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\varepsilon}_{j2}) = 0\), and for \(\text{cov}(\boldsymbol{\epsilon}_i^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) = 2\text{tr}(\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma})\). Further, \(\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2})\) is zero for \(j \neq j2\) and \(\text{diag}(\boldsymbol{\delta}^2)\) otherwise, and similar for \(\text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\epsilon}_{i2})\) .
Consequently, for a species association matrix where we consider the block of the covariance matrix where \(i = i2\), or for the site-to-site matrix where we consider the block \(j = j2\), we have:
\[\begin{split} \Sigma_{j,j2} &= \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i} + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j2}) + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{var}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} \\ &= \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i} + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2}+ 2\text{tr}\{\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\}, \end{split}\]and
\[\begin{equation} \Sigma_{i,i2} = \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j} + \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2} + 2\text{tr}\{\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\}. \end{equation}\]We can visualize those matrices here for (e.g.,) the first site and first species, respectively.
#species
spec.mat <- matrix(0,nrow=NSpecies,ncol=NSpecies)
Sigma <- diag(SDs[grep("Latent Variable", rownames(SDs)),1]^2)
Sigma.sp <- diag(SDs[grep("Species", rownames(SDs)),1]^2)
Sigma.si <- diag(SDs[grep("Site", rownames(SDs)),1]^2)
for(j in 1:NSpecies){
for(j2 in 1:NSpecies){
if(j==j2){
spec.mat[j,j2] = X[1,,drop=F]%*%B.m%*%Sigma%*%Sigma.sp%*%t(B.m)%*%t(X[1,,drop=F])
}
spec.mat[j,j2] = spec.mat[j,j2] + TR[j,,drop=F]%*%O.m%*%Sigma%*%Sigma.si%*%Sigma%*%t(O.m)%*%t(TR[j2,,drop=F]) + 2*sum(diag((Sigma.sp%*%Sigma%*%Sigma.si%*%Sigma)))
}
}
spec.cor.mat <- cov2cor(spec.mat)
colnames(spec.cor.mat) <- row.names(spec.cor.mat) <- colnames(Y)
corrplot::corrplot(spec.cor.mat,type = "lower",order = "AOE", main = "Species", mar = c(1,1,1,1),tl.srt=45,tl.cex = .5)
#sites
site.mat <- matrix(0,nrow=NSites,ncol=NSites)
for(i in 1:NSites){
for(i2 in 1:NSites){
if(i==i2){
site.mat[i,i2] = TR[1,,drop=F]%*%O.m%*%Sigma%*%Sigma.si%*%t(O.m)%*%t(TR[1,,drop=F])
}
site.mat[i,i2] = site.mat[i,i2] + X[i,,drop=F]%*%B.m%*%Sigma%*%Sigma.sp%*%Sigma%*%t(B.m)%*%t(X[i2,,drop=F]) + 2*sum(diag((Sigma.sp%*%Sigma%*%Sigma.si%*%Sigma)))
}
}
site.cor.mat <- cov2cor(site.mat)
colnames(site.cor.mat) <- row.names(site.cor.mat) <- paste("Site", 1:NSites)
corrplot::corrplot(site.cor.mat,type = "lower",order = "AOE", main = "Sites", mar = c(3,3,3,3),tl.srt=45,tl.cex = .5)